Phase 1c: Bias-corrected local-linear CI (CCT 2014) by igerber · Pull Request #340 · igerber/diff-diff

igerber · 2026-04-19T23:17:56Z

Summary

Port nprobust's lprobust() single-eval-point path (lprobust.R:177-246) into diff_diff/_nprobust_port.py as lprobust() + LprobustResult. Uses the Calonico-Cattaneo-Titiunik (2014) bias-combined design matrix Q.q to compute classical and bias-corrected point estimates along with naive and robust standard errors in a single pass.
Add public wrapper bias_corrected_local_linear() + BiasCorrectedFit in diff_diff/local_linear.py. Returns the μ-scale CI [tau.bc ± z_{1-α/2} * se.rb] for paper Equation 8. Auto-bandwidth path delegates to mse_optimal_bandwidth and matches nprobust's rho=1 default (b = h).
Refactor: extract shared _validate_had_inputs helper so Phase 1b and Phase 1c enforce identical HAD-scope input rules (empty/non-finite/negative-dose/off-support-boundary/mass-point/Design 1' plausibility).
Golden generator benchmarks/R/generate_nprobust_lprobust_golden.R + JSON on 5 DGPs (uniform, Beta(2,2), half-normal, clustered, shifted-boundary); R's z = qnorm(1-α/2) exported so Python consumes the exact critical value, eliminating ppf/qnorm ULP drift on CI bounds.

Methodology references (required if estimator / math changes)

Method name(s): Calonico-Cattaneo-Titiunik (2014) robust bias correction for local-polynomial regression; Calonico-Cattaneo-Farrell (2018) MSE-optimal bandwidth (consumed from Phase 1b); de Chaisemartin, Ciccia, D'Haultfoeuille, Knau (2026) Equation 8 (mu-scale CI).
Paper / source link(s): paper review at docs/methodology/papers/dechaisemartin-2026-review.md; nprobust 0.5.0 (SHA 36e4e53), R/nprobust/R/lprobust.R:177-246.
Any intentional deviations from the source (and why): (a) vce in {"hc2","hc2_bm"} combined with cluster= raises ValueError; nprobust silently accepts the combination but it is not a well-defined estimator (hc2/hc3 assume independent observations). (b) Auto mode uses b = h (matching nprobust's rho=1 default) rather than Phase 1b's distinct b_mse; the paper itself uses a single h*_G throughout Equation 8, and b_mse is still surfaced via bandwidth_diagnostics for inspection. (c) Clustered golden DGP 4 uses manual h=b=0.3 to sidestep an nprobust-internal singleton-cluster shape bug in lpbwselect.mse.dpi's pilot fits; the Python port has no equivalent bug. All three deviations are documented in docs/methodology/REGISTRY.md under the Phase 1c entry.

Validation

Tests added/updated: tests/test_nprobust_port.py::TestLprobustSingleEval (8 port-level tests exercising the CCT-2014 math with R-supplied bandwidths); tests/test_bias_corrected_lprobust.py (29 wrapper-level tests: parity × 5 DGPs, CI behavior, full input contract, parameter interactions, end-to-end with Phase 1b selector, validator idempotence). Parity at tiered tolerances per docs/methodology/REGISTRY.md Phase 1c entry: atol=1e-12 on tau/se outputs, atol=1e-13 on CI bounds (R's z exported in golden JSON); clustered DGP 4 hits bit-parity at atol=1e-14.
Backtest / simulation / notebook evidence (if applicable): N/A - estimator-math PR without user-facing tutorials. Phase 2 (estimator class) will add notebook evidence.

Security / privacy

Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

Port nprobust::lprobust's single-eval-point path (lprobust.R:177-246) as the foundation for paper Equation 8: - diff_diff/_nprobust_port.py: add `lprobust()` + `LprobustResult`. Uses the Calonico-Cattaneo-Titiunik (2014) bias-combined design matrix `Q.q` to produce classical and bias-corrected point estimates along with naive and robust (CCT 2014) standard errors in a single pass. - diff_diff/local_linear.py: add `bias_corrected_local_linear()` + `BiasCorrectedFit`. Public wrapper returns the mu-scale CI `[tau.bc +/- z_{1-alpha/2} * se.rb]`. Auto-bandwidth path delegates to `mse_optimal_bandwidth` and honors nprobust's rho=1 default (b = h). Also extracts shared `_validate_had_inputs` helper from Phase 1b. - benchmarks/R/generate_nprobust_lprobust_golden.R + golden JSON: 5 DGPs (Uniform, Beta(2,2), half-normal, clustered, shifted-boundary); R's z = qnorm(1-alpha/2) exported so Python skips ppf and matches bit-wise on CI arithmetic. - Tests: TestLprobustSingleEval (8 port-level) + test_bias_corrected_lprobust (29 wrapper-level). Tiered tolerances per plan: 1e-12 on tau/se, 1e-13 on CI bounds; clustered DGP 4 hits bit-parity (1e-14). Known deviations from nprobust (in REGISTRY.md): hc2/hc3 + cluster raises (nprobust silently accepts); clustered DGP 4 uses manual h=b=0.3 to sidestep an nprobust-internal singleton-cluster bug. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-19T23:25:20Z

Overall Assessment

⛔ Blocker — one P0 inference bug and one P1 methodology/parameter-propagation bug remain in the new Phase 1c path.

Executive Summary

The new lprobust port and wrapper mostly line up with the Phase 1c registry entry, and the documented deviations (b = h in auto mode, rejecting hc2/hc3 + cluster, clustered DGP 4 workaround) are correctly documented and therefore non-blocking.
P0: bias_corrected_local_linear() builds CI bounds directly from tau_bc ± z * se_rb with no NaN-safe gate, so degenerate exact-fit cases can return finite or infinite CIs when se_robust is invalid.
P1: the auto-bandwidth path silently ignores user cluster, vce, and nnmatch settings. It always selects h/b with the Phase 1b unclustered vce="nn", nnmatch=3 selector, then applies the requested settings only in the final lprobust() call.
The new tests cover parity and many input-contract branches, but they do not exercise the broken auto-parameter interactions or zero-SE CI handling.
The TODO items added in this PR are appropriately tracked tech debt and are not blockers.
I reviewed this statically from the diff and surrounding code; I did not run the suite in this sandbox because the Python environment here is missing numpy.

Methodology

Severity: P0. Impact: The new Equation 8 CI path returns misleading inference when se_robust is zero or non-finite. lprobust() can legitimately produce se_rb == 0 in exact-fit cases because the residual vector can collapse to zero before the sandwich step, and the wrapper then returns a finite zero-width CI instead of (NaN, NaN). That violates the repo’s established inference contract for invalid SEs and is silent bad statistical output. See diff_diff/_nprobust_port.py:L1257-L1271, diff_diff/local_linear.py:L1128-L1135, and diff_diff/utils.py:L175-L209. Concrete fix: derive the CI through safe_inference(result.tau_bc, result.se_rb, alpha=alpha)[2] or an equivalent all-or-nothing NaN gate, and add regression coverage for constant-y / exact-fit and other invalid-SE cases.
Severity: P1. Impact: The affected methods are the CCT (2014) robust bias-corrected CI plus the CCF (2018) auto-bandwidth stage. In auto mode, bias_corrected_local_linear() calls mse_optimal_bandwidth() without forwarding cluster, vce, or nnmatch, while that selector is hard-coded to cluster=None, vce="nn", and nnmatch=3. The later lprobust() call does use the user-requested settings, so clustered fits or non-nn / non-default-nnmatch fits are using bandwidths chosen for a different estimator than the one reported. That is an undocumented methodology deviation and an incomplete parameter-propagation bug. See diff_diff/local_linear.py:L1078-L1086, diff_diff/local_linear.py:L1112-L1125, and diff_diff/local_linear.py:L797-L810. Concrete fix: in auto mode, call diff_diff._nprobust_port.lpbwselect_mse_dpi directly with cluster=cluster_arr, vce=vce, and nnmatch=nnmatch, or explicitly reject unsupported auto combinations until that threading is implemented and validated.
Severity: P3. Impact: The b = h auto-mode choice, the hc2/hc3 + cluster rejection, and the manual clustered DGP 4 workaround are explicitly documented in the methodology registry, so they are not defects for this PR. See docs/methodology/REGISTRY.md:L2315-L2317. Concrete fix: none.

Code Quality

No additional findings beyond the inference bug above.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3. Impact: The new TODO rows correctly track deferred parity expansion for additional kernels / HC VCE modes plus later weights= / multi-eval support and clustered-auto parity work, so those items are non-blocking deferred work rather than review blockers. See TODO.md:L89-L93. Concrete fix: none in this PR.

Security

No findings.

Documentation/Tests

Severity: P2. Impact: The new suite does not cover the two failing surfaces above. It tests clustered behavior only with manual h/b, and the auto end-to-end test uses only the default unclustered vce="nn", nnmatch=3 path; there is also no regression test for zero/non-finite se_robust CI propagation. That left both the auto-parameter mismatch and the invalid-SE CI bug unguarded. See tests/test_bias_corrected_lprobust.py:L333-L387. Concrete fix: add tests for cluster+auto, vce="hc1"+auto, nnmatch != 3 + auto, and exact-fit/constant-y cases that should yield ci_low = ci_high = NaN.

Path to Approval

Make the Phase 1c CI construction NaN-safe so any se_robust <= 0 or non-finite se_robust yields (NaN, NaN) rather than a finite or infinite CI, and add a regression test that exercises that path.
Fix auto-bandwidth propagation so cluster, vce, and nnmatch are honored during bandwidth selection, or reject unsupported auto combinations explicitly. Add regression tests for cluster+auto, non-nn vce+auto, and non-default nnmatch+auto.

P0: bias_corrected_local_linear now routes the CI through `safe_inference()` so degenerate cases with `se_robust <= 0` or non-finite `se_robust` (e.g., exact-fit / constant-y) return `(NaN, NaN)` rather than a misleading zero-width or infinite CI. Matches the repo-wide inference contract (CLAUDE.md Key Design Pattern #6). P1: Auto-bandwidth path now calls `lpbwselect_mse_dpi` directly with `cluster`, `vce`, and `nnmatch` forwarded. Previously it went through `mse_optimal_bandwidth` which hard-codes unclustered / nn / nnmatch=3, silently mismatching the downstream `lprobust` fit's reported estimator. Tests added: TestNaNSafeCI (constant-y + near-zero-SE) and TestAutoBandwidthForwardsParameters (cluster+auto, vce='hc1'+auto, nnmatch=5+auto), all asserting the selected bandwidth changes when the corresponding parameter changes (catches silent fallback). Also: suppress spurious BLAS FPE warnings in lprobust_bw's hc/hc2/hc3 branch (numpy issue #21432 pattern), newly reachable via the wired-through vce='hc1' auto-bandwidth path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-19T23:35:05Z

/ai-review

github-actions · 2026-04-19T23:42:21Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 237a80098e62b21c956ec5db755f5bcfc04ccbc0

Overall Assessment

⚠️ Needs changes — one unmitigated P1 remains in the new clustered Phase 1c manual-bandwidth path.

Executive Summary

P1: The new clustered bias_corrected_local_linear(..., h=..., cluster=...) / lprobust(..., cluster=...) path only rejects missing cluster IDs for floating dtypes. Object-dtype None / np.nan can bypass validation and then contaminate clustered variance aggregation.
The two blocker findings from the previous AI review are fixed: auto mode now forwards cluster / vce / nnmatch into the selector, and CI construction now goes through safe_inference. See diff_diff/local_linear.py:L1077-L1103, diff_diff/local_linear.py:L1171-L1185, and diff_diff/utils.py:L175-L209.
The Phase 1c deviations called out in the PR (b = h in auto mode, rejecting hc2/hc3 + cluster, and the clustered DGP 4 workaround) are documented in the methodology registry, so they are non-blocking P3 informational items. See docs/methodology/REGISTRY.md:L2316-L2317.
The new TODO rows correctly track deferrable follow-up work around additional parity coverage and out-of-scope extensions. See TODO.md:L89-L93.
Review was static only; I could not run the suite in this sandbox because numpy is unavailable here.

Methodology

Severity: P1
Impact: The affected method is the clustered CCT (2014) robust bias-corrected local-linear variance path. The new wrapper and new lprobust port only reject missing cluster IDs when cluster.dtype is floating, but the clustered sandwich later groups on np.unique(cluster). That means object-dtype clusters carrying None or object-np.nan can either be treated as real clusters or fail later with a non-targeted error, instead of honoring the documented “reject missing cluster IDs” contract. On the silent branch, that can misstate clustered se_cl / se_rb with no warning. See diff_diff/local_linear.py:L1053-L1065, diff_diff/_nprobust_port.py:L1127-L1139, and the clustered aggregation at diff_diff/_nprobust_port.py:L341-L350. The selector path already has the correct cross-dtype missing-ID guard at diff_diff/_nprobust_port.py:L675-L713.
Concrete fix: Reuse the selector’s has_missing logic in both bias_corrected_local_linear and lprobust, so None and object-dtype np.nan are rejected before any clustered variance computation.
Severity: P3
Impact: The documented Phase 1c deviations/workarounds are properly recorded in the registry and therefore are not defects for this PR. See docs/methodology/REGISTRY.md:L2316-L2317.
Concrete fix: None.

Code Quality

No additional findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3
Impact: The remaining Phase 1c follow-ups are explicitly tracked in TODO.md rather than left implicit: additional kernel parity, HC-mode parity, weights=, multi-eval support, and clustered-auto parity once the upstream nprobust issue is resolved. See TODO.md:L89-L93.
Concrete fix: None in this PR.

Security

No findings.

Documentation/Tests

Severity: P2
Impact: The new regression suite only checks float NaN cluster rejection on the manual-bandwidth wrapper path, so it misses the object-dtype missing-ID case that allows the P1 above through. See tests/test_bias_corrected_lprobust.py:L313-L319.
Concrete fix: Add wrapper-level and direct-port tests using cluster=np.array([...], dtype=object) with None and object-np.nan, asserting a targeted ValueError.

Path to Approval

Extend missing-cluster validation in bias_corrected_local_linear and diff_diff._nprobust_port.lprobust to reject None / object-dtype np.nan, not just floating NaN/Inf.
Add regression coverage for the manual-bandwidth wrapper path and direct lprobust path on object-dtype cluster arrays with missing sentinels, and assert the targeted validation error before clustered variance computation.

CI review follow-up: the floating-dtype-only missing-ID guard in `bias_corrected_local_linear` and `lprobust` let object-dtype arrays with `None` / object `np.nan` sentinels bypass validation. The downstream `lprobust_vce` cluster loop would then group on `np.unique`, treating the sentinel as a real cluster and silently misstating clustered SE. Extract the dtype-agnostic `_cluster_has_missing` helper already used inside `lpbwselect_mse_dpi` and apply it at all three entry points: the selector, the port-level `lprobust`, and the public `bias_corrected_local_linear` wrapper. Regression tests: object-dtype cluster arrays with None and with np.nan sentinels raise a targeted ValueError at both the wrapper (test_bias_corrected_lprobust) and the port (test_nprobust_port) entry points. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-19T23:44:35Z

/ai-review

github-actions · 2026-04-19T23:58:28Z

🔁 AI review rerun (requested by @igerber)

Head SHA: bee6805c663aef540fce95cda7152b9708181bd0

Overall Assessment

✅ Looks good — this re-review resolves the prior P1 on object-dtype missing cluster IDs, keeps the earlier CI/inference fix in place, and I did not find any new unmitigated P0/P1 issues in the changed files. Review was static only because this sandbox does not have pytest or numpy.

Executive Summary

The affected methods are the CCT (2014) robust bias-corrected local-polynomial estimator, the CCF (2018) MSE-DPI selector used in auto mode, and the de Chaisemartin et al. (2026) Equation 8 μ-scale CI wrapper; the implementation is source-mapped and consistent with the Phase 1c registry entry in docs/methodology/REGISTRY.md:L2317-L2318, diff_diff/_nprobust_port.py:L1016-L1305, and diff_diff/local_linear.py:L934-L1211.
The previous blocker on missing cluster IDs is fixed: the new dtype-agnostic _cluster_has_missing() is used in the selector, direct port, and public wrapper, and the tests now cover float NaN, object None, and object np.nan sentinels in both surfaces at diff_diff/_nprobust_port.py:L87-L111, diff_diff/_nprobust_port.py:L702-L724, diff_diff/_nprobust_port.py:L1138-L1154, diff_diff/local_linear.py:L1053-L1073, tests/test_bias_corrected_lprobust.py:L313-L342, and tests/test_nprobust_port.py:L591-L620.
The earlier inference/CI issue is still fixed correctly: the wrapper now computes CI bounds through safe_inference(), so degenerate se_robust values propagate to NaN CI endpoints instead of misleading finite output at diff_diff/local_linear.py:L1179-L1193 and diff_diff/utils.py:L175-L209.
Auto-bandwidth mode now forwards cluster, vce, and nnmatch into lpbwselect_mse_dpi, so the selector and final lprobust fit use the same variance specification at diff_diff/local_linear.py:L1085-L1111, with targeted coverage in tests/test_bias_corrected_lprobust.py:L458-L518.
The three Phase 1c deviations called out in the PR are documented in the methodology registry, so they are non-blocking informational items: b = h in auto mode, rejecting cluster with vce in {"hc2","hc3"}, and using manual h=b=0.3 for the clustered golden DGP4 at docs/methodology/REGISTRY.md:L2317-L2318.

Methodology

Affected methods: Calonico-Cattaneo-Titiunik (2014) robust bias correction via lprobust, Calonico-Cattaneo-Farrell (2018) MSE-DPI bandwidth selection in auto mode, and the de Chaisemartin et al. (2026) Equation 8 μ-scale CI wrapper. I did not find any unmitigated methodology defects in the changed implementation; the estimator logic is source-mapped in diff_diff/_nprobust_port.py:L1038-L1067, and the wrapper behavior matches the registry notes in docs/methodology/REGISTRY.md:L2317-L2318.

P3 Impact: The Phase 1c non-R behaviors are explicitly documented in the registry, so they are informational rather than defects: auto mode uses b = h, cluster + hc2/hc3 is rejected at the wrapper, and clustered DGP4 uses manual bandwidths. Concrete fix: None. See docs/methodology/REGISTRY.md:L2317-L2318 and the matching implementation in diff_diff/local_linear.py:L1038-L1140.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

P3 Impact: Remaining Phase 1c coverage gaps are explicitly tracked in TODO.md for non-Epanechnikov kernel parity, HC-mode parity, weights, multi-eval support, and clustered auto-bandwidth parity. Because these are tracked deferred items rather than silent correctness bugs, they are non-blocking. Concrete fix: None in this PR; follow the tracked TODO items. See TODO.md:L89-L93.

Security

No findings.

Documentation/Tests

P3 Impact: The developer-facing docstrings still say auto mode chooses both h and b via mse_optimal_bandwidth, but the implementation now calls lpbwselect_mse_dpi directly and then sets b = h to preserve rho=1. That can mislead readers about how cluster, vce, and nnmatch affect auto mode. Concrete fix: Update the BiasCorrectedFit and bias_corrected_local_linear docstrings to describe the direct selector call, the b = h rule, and the fact that b_mse is surfaced only in diagnostics. See diff_diff/local_linear.py:L891-L897, diff_diff/local_linear.py:L975-L980, and diff_diff/local_linear.py:L1085-L1140.
Review note: I could not execute the new tests here because pytest is not installed and numpy is unavailable in the sandbox. The added coverage is well targeted on the previously flagged surfaces: missing-cluster sentinels, NaN-safe CI behavior, and auto-bandwidth parameter forwarding in tests/test_bias_corrected_lprobust.py:L313-L342, tests/test_bias_corrected_lprobust.py:L430-L450, tests/test_bias_corrected_lprobust.py:L458-L518, and tests/test_nprobust_port.py:L591-L620.

Docstring note from AI review. The wrapper's docstring still said auto mode chooses both h and b via `mse_optimal_bandwidth`, but the implementation now calls `_nprobust_port.lpbwselect_mse_dpi` directly (to forward cluster / vce / nnmatch) and then sets `b = h` per nprobust's rho=1 default. Update the docstrings to reflect that. Behavior is unchanged; this is a comment-only clarification. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-20T00:34:04Z

/ai-review

github-actions · 2026-04-20T00:51:13Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 7672f8d3b2dbb7b2a32afa6e793b411367e40085

Overall Assessment

✅ Looks good — this re-review resolves the prior P1 surfaces I checked, and I did not find any unmitigated P0/P1 issues in the changed files. The only remaining item is a P3 documentation/test-contract mismatch. This was a static review only because the sandbox does not have pytest or scipy.

Executive Summary

The affected methods are the CCT (2014) robust bias-corrected local-polynomial estimator, the CCF (2018) MSE-DPI selector used in auto mode, and the de Chaisemartin et al. (2026) Equation 8 μ-scale CI wrapper.
The earlier missing-cluster-ID issue appears fixed across the wrapper, selector, and direct port via the shared dtype-agnostic _cluster_has_missing() guard.
The earlier NaN/CI issue also appears fixed: bias_corrected_local_linear() now routes CI construction through safe_inference(), so invalid or zero se_robust values should propagate to NaN CI endpoints instead of misleading finite output.
Auto-bandwidth mode now forwards cluster, vce, and nnmatch directly into lpbwselect_mse_dpi(), avoiding a selector/fit methodology mismatch.
The three non-R Phase 1c behaviors called out in the PR are documented in the Methodology Registry, so they are informational rather than defects under the review rules.
One P3 remains: several new docs/comments say Python “consumes R’s z directly,” but the changed code still computes production CI through SciPy-based safe_inference(), and the parity tests compare stored CI bounds rather than reconstructing from the exported z.

Methodology

P3 Impact: The only non-R behaviors I found are already documented in the registry: auto mode uses b = h, the public wrapper rejects cluster with vce in {"hc2","hc3"}, and clustered DGP 4 uses manual h=b=0.3. Under the review rules these are informational, not defects. Concrete fix: None. References: docs/methodology/REGISTRY.md:L2317-L2318, diff_diff/local_linear.py:L1048-L1101, tests/test_bias_corrected_lprobust.py:L370-L385

I did not find any unmitigated P0/P1 methodology defects in the changed estimator path. The prior cluster-missing, NaN-safe CI, and auto-forwarding findings appear addressed in diff_diff/_nprobust_port.py:L87-L111, diff_diff/_nprobust_port.py:L1138-L1154, diff_diff/local_linear.py:L1063-L1083, diff_diff/local_linear.py:L1189-L1203, diff_diff/local_linear.py:L1103-L1121, tests/test_bias_corrected_lprobust.py:L313-L342, tests/test_bias_corrected_lprobust.py:L430-L518, and tests/test_nprobust_port.py:L591-L620.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

P3 Impact: The remaining Phase 1c coverage gaps I would normally worry about here are explicitly tracked in TODO.md: non-Epanechnikov kernel parity, HC-mode parity, weights=, multi-eval support, and clustered auto-bandwidth parity. Under the review rules these tracked items are non-blocking. Concrete fix: None in this PR; follow the tracked TODO items. References: TODO.md:L89-L93

Security

No findings.

Documentation/Tests

P3 Impact: The registry, generator metadata, and wrapper comments say the Python side “consumes R’s z directly,” but the actual wrapper builds CIs through safe_inference(), and the new parity tests compare against stored ci_low/ci_high rather than using the exported z. This does not change current outputs, but it overstates what the parity harness is actually eliminating and makes the CI-tolerance story harder to audit. Concrete fix: Either thread the exported z into the parity harness, or revise the registry/comments to say z is stored for audit/reference while production CI still uses SciPy through safe_inference(). References: docs/methodology/REGISTRY.md:L2318, benchmarks/R/generate_nprobust_lprobust_golden.R:L61-L77, benchmarks/R/generate_nprobust_lprobust_golden.R:L124-L127, diff_diff/local_linear.py:L1022-L1028, diff_diff/local_linear.py:L1189-L1203, tests/test_bias_corrected_lprobust.py:L123-L172

Review note: I could not execute the new tests in this sandbox because pytest is not installed, and I also could not import SciPy here.

…tion) P3 doc-accuracy note from AI review: the registry, R generator, and local_linear.py comments said the Python side "consumes R's z directly" but the wrapper actually computes its own z via scipy.stats.norm.ppf inside safe_inference(). The golden JSON's z field is an audit/reference export so a reviewer can confirm R's qnorm and Python's scipy.stats.norm.ppf agree to machine precision. The parity harness compares Python-computed CI bounds to R-computed CI bounds (not R's z fed into Python's arithmetic). Behavior is unchanged; only doc phrasing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-20T10:39:44Z

/ai-review

github-actions · 2026-04-20T10:46:30Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 12a8efea071535d7dd3798ca0bf8981af9980c4f

Overall Assessment

✅ Looks good — I did not find any unmitigated P0/P1 issues in the changed files. The affected methods are the CCT (2014) robust bias-corrected local-polynomial estimator, the CCF (2018) MSE-DPI selector used in auto mode, and the de Chaisemartin et al. (2026) Equation 8 μ-scale CI wrapper as recorded in docs/methodology/REGISTRY.md:L2316-L2318. Prior re-review concerns around missing cluster IDs, NaN-safe CI construction, and auto-bandwidth parameter forwarding appear resolved in diff_diff/_nprobust_port.py:L87-L111, diff_diff/local_linear.py:L1063-L1120, diff_diff/local_linear.py:L1189-L1203, and diff_diff/utils.py:L175-L205.

Executive Summary

The changed estimator path is source-mapped to nprobust::lprobust single-eval logic and aligns with the Phase 1c registry entry in docs/methodology/REGISTRY.md:L2317-L2318.
The three non-R behaviors in scope are documented and correctly implemented, so they are informational rather than defects: b = h in auto mode, rejecting cluster with vce in {"hc2","hc3"}, and manual h=b=0.3 for the clustered golden DGP.
The earlier cluster-missing and NaN/CI issues appear fixed via the shared _cluster_has_missing() guard and safe_inference() CI path in diff_diff/_nprobust_port.py:L87-L111, diff_diff/local_linear.py:L1063-L1083, and diff_diff/local_linear.py:L1189-L1203.
Auto mode now forwards cluster, vce, and nnmatch into the selector, avoiding the prior selector/fit mismatch in diff_diff/local_linear.py:L1095-L1120.
Remaining Phase 1c parity gaps are properly tracked in TODO.md:L89-L93, so they are non-blocking under the review rules.
One P3 remains: the changed parity/audit artifacts are still internally inconsistent about CI critical-value provenance and the exact parity contract.

Review note: this was a static review. I could not execute the new parity tests in this sandbox because the available Python environment is missing numpy.

Methodology

Severity: P3
Impact: The only deviations from source behavior I found are already documented in the Methodology Registry and correctly reflected in code, so they are informational rather than defects: auto mode sets b = h, cluster with vce in {"hc2","hc3"} raises, and clustered DGP 4 uses manual h=b=0.3. See docs/methodology/REGISTRY.md:L2318, diff_diff/local_linear.py:L1048-L1150, and tests/test_bias_corrected_lprobust.py:L174-L212.
Concrete fix: None.

No unmitigated P0/P1 methodology defects found in the changed estimator path.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3
Impact: Remaining Phase 1c parity/feature gaps are explicitly tracked in TODO.md: alternate kernels, HC vce modes, weights=, multi-eval support, and clustered auto-bandwidth parity. Under the review rules these are non-blocking. See TODO.md:L89-L93.
Concrete fix: None in this PR; follow the tracked TODO items.

Security

No findings.

Documentation/Tests

Severity: P3
Impact: The changed Phase 1c parity story is still internally inconsistent. The public wrapper Notes block still says tau_cl/se_cl are bit-parity and that Python consumes R’s z directly in diff_diff/local_linear.py:L1022-L1028, while the runtime CI path and registry say production CI uses SciPy and stores R’s z only for audit in diff_diff/local_linear.py:L1189-L1200 and docs/methodology/REGISTRY.md:L2318. The committed golden JSON metadata also still repeats the old “consume R’s critical value directly” claim in benchmarks/data/nprobust_lprobust_golden.json:L13, even though the generator now describes the corrected audit-only usage in benchmarks/R/generate_nprobust_lprobust_golden.R:L124-L130. Separately, the wrapper parity tests use 1e-12 for several CI assertions and do not assert CI bounds on DGP 4/5, which is looser/narrower than the registry text implies in tests/test_bias_corrected_lprobust.py:L109-L117 and tests/test_bias_corrected_lprobust.py:L123-L212. This does not change estimator outputs, but it weakens the audit trail and suggests the committed JSON was not regenerated from the current generator.
Concrete fix: Align the wrapper docstring, registry wording, generator metadata, committed JSON metadata, and parity assertions to one exact contract, then regenerate benchmarks/data/nprobust_lprobust_golden.json.

P3 follow-up from AI review. Three small inconsistencies to resolve: 1. `bias_corrected_local_linear` docstring still described tau_cl/se_cl as bit-parity and said Python consumes R's z directly. The actual contract is atol=1e-12 on all four scalars (DGP 1-3) and the wrapper computes its own z via scipy.stats.norm.ppf; R's qnorm is stored in the JSON for audit only. Docstring updated to match. 2. Committed golden JSON metadata still had the old "consume R's critical value directly" string because the generator was edited without regenerating. Regenerated so JSON metadata matches the corrected audit-export wording in the R script. 3. Parity tests for DGP 4 and DGP 5 did not assert CI bounds. Added ci_low / ci_high assertions at the same tolerance as the corresponding se_rb assertion (bit-parity for DGP 4, 1e-12 for DGP 5), so the audit surface matches what the registry states. Behavior unchanged; tests strengthened and docs aligned. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-20T11:13:26Z

/ai-review

github-actions · 2026-04-20T11:22:24Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 7eeb42ac95be199eb937c4e827f9bb382135b6ac

Overall Assessment
⚠️ Needs changes. I did not find a P0, but there is one unmitigated P1 in the new Phase 1c variance path.

Executive Summary

The prior P3 around CI critical-value provenance looks largely resolved: the registry, wrapper notes, generator, and committed JSON now agree that production CI uses SciPy’s norm.ppf and stores R’s qnorm only for audit.
One P1 methodology issue remains in the CCT (2014) port: lprobust() reuses the p-fit HC2/HC3 leverage for the q-fit robust-bias-correction residual path, which can misstate se_rb and the de Chaisemartin et al. (2026) Equation 8 CI for unclustered vce="hc2" / "hc3".
The Phase 1c deviations that are actually documented in the Methodology Registry are non-blocking: auto mode uses b = h, clustered hc2/hc3 is rejected, and clustered DGP 4 uses manual h=b=0.3.
The new TODO.md entries correctly track deferrable Phase 1c follow-ups and are non-blocking under project policy.
Static review only: this sandbox is missing numpy, so I could not execute the new parity tests.

Methodology

Severity: P1. Impact: Affected methods are the Calonico-Cattaneo-Titiunik (2014) robust bias-corrected local-polynomial estimator in lprobust() and the de Chaisemartin et al. (2026) Equation 8 wrapper that consumes se_rb. In diff_diff/_nprobust_port.py:L1265-L1288, the code computes only one HC2/HC3 leverage vector from the p-fit (R_p, invG_p, W_h) and passes it to both res_h and res_b, even though res_b is built from q-fit predictions (predicts_q). Because diff_diff/_nprobust_port.py:L322-L330 applies HC2/HC3 scaling directly from hii, the robust-bias-correction path is combining q-fit residuals with p-fit leverage. That silently misstates V_Y_bc, se_rb, and therefore the public CI for exposed unclustered vce="hc2" / "hc3" modes in diff_diff/local_linear.py:L953-L965 and diff_diff/local_linear.py:L993-L995. The same module’s bandwidth code already uses stage-matched leverage, which is the pattern missing here: diff_diff/_nprobust_port.py:L480-L500 and diff_diff/_nprobust_port.py:L543-L560. Concrete fix: compute separate hii_p and hii_q, pass hii_p to res_h and hii_q to res_b, and add regression tests for unclustered hc2/hc3 that assert se_rb and CI against R goldens or an equivalent fixed-bandwidth reference.
Severity: P3. Impact: The Phase 1c behaviors that actually deviate from the R reference are documented in the registry, so they are informational rather than defects: auto mode sets b = h, clustered hc2/hc3 is rejected, and clustered DGP 4 uses manual h=b=0.3. See docs/methodology/REGISTRY.md:L2317-L2318. Concrete fix: None.

Code Quality
No findings.

Performance
No findings.

Maintainability
No findings.

Tech Debt

Severity: P3. Impact: The new Phase 1c TODO rows correctly track deferred parity/features for additional kernels, HC VCE goldens, weights=, multi-eval support, and clustered-auto parity, so these are non-blocking under the review rules. They do not mitigate the HC2/HC3 leverage bug above, because incorrect SE math is not deferrable. See TODO.md:L89-L93. Concrete fix: None in this PR.

Security
No findings.

Documentation/Tests

Severity: P3. Impact: The exact CI-tolerance contract is still inconsistent. The registry and generator comment say CI bounds are checked at atol=1e-13, but several wrapper parity assertions use 1e-12 for dgp2, dgp3, and dgp5. This does not change estimator outputs, but it leaves the audit trail looser than the published contract. See docs/methodology/REGISTRY.md:L2318, benchmarks/R/generate_nprobust_lprobust_golden.R:L28-L31, tests/test_bias_corrected_lprobust.py:L153-L172, and tests/test_bias_corrected_lprobust.py:L220-L225. Concrete fix: either tighten those CI assertions to 1e-13 where they truly hold, or relax the documented contract to the actual tested tolerance.

Path to Approval

In diff_diff._nprobust_port.lprobust(), compute a separate q-fit HC2/HC3 leverage vector and use it for res_b; keep the p-fit leverage only on res_h.
Add positive-path tests for unclustered vce="hc2" and vce="hc3" in both tests/test_nprobust_port.py and tests/test_bias_corrected_lprobust.py that assert se_rb, ci_low, and ci_high against deterministic goldens or a direct reference calculation.

Address CI review P1 on HC2/HC3 leverage. nprobust's lprobust reuses the p-fit hat-matrix leverage for the q-fit residual path (lprobust.R:229-241). The Python port matches R exactly, but that R behavior deserves a derivation and a dedicated golden anchor before the public wrapper can advertise CCT-2014 conformance on hc-mode paths. The Phase 1c plan always said hc0-3 were "exposed but not golden-tested"; this commit restricts the public surface to what actually has a parity anchor. Changes: - `bias_corrected_local_linear`: `vce != "nn"` now raises `NotImplementedError` with a pointer to the port-level `diff_diff._nprobust_port.lprobust` for callers who still need the broader surface. Docstring updated. - Tests: swap positive-path hc-mode tests for negative-path raises; drop `test_auto_vce_hc1_returns_finite` (now superseded by `test_auto_vce_hc1_rejected_in_phase_1c`); drop the cluster+hc2/hc3 tests (now covered by the plain hc2/hc3 rejection tests). - REGISTRY.md: swap "Deviation from R" wording for a clearer "public-API surface restriction" note explaining the R-side hii reuse and the deferred derivation. - TODO.md: upgrade the hc-mode expansion TODO from Low to Medium and document the q-fit leverage derivation as the gating work. Also fix the P3 tolerance-doc inconsistency: registry/docstring now say atol=1e-12 on all six scalars across unclustered DGPs (tests actually assert 1e-12, with DGP 1 / DGP 3 landing closer to 1e-13 in practice). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-20T11:34:25Z

/ai-review

github-actions · 2026-04-20T11:39:39Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 2c48e60e7efd8524acd69b5aeab939e688fadd89

Overall Assessment

✅ Looks good

Executive Summary

The prior P1 on exposed HC-mode robust-bias-correction SEs is effectively resolved for the public Phase 1c API: bias_corrected_local_linear() now rejects non-"nn" vce, the registry documents that restriction, and HC exposure is explicitly deferred in TODO.md.
On the supported path, the PR matches the paper/registry design: it implements the μ-scale Equation 8 CI, uses safe_inference() for NaN-consistent CI gating, and forwards auto-bandwidth inputs into the same selector the final fit uses.
The shared HAD validator now covers the important boundary, mass-point, negative-dose, and missing-cluster edge cases on both wrapper and port entry points.
Remaining Phase 1c limitations are documented or tracked rather than silently shipped.
Static review only: this environment does not have numpy, scipy, or pytest, so I could not execute the new parity tests.

Methodology

Severity P3 (informational). Impact: The previous HC2/HC3 SE concern is no longer exposed on the advertised Phase 1c surface. The public wrapper hard-restricts vce to "nn", tests pin the rejection behavior, and the registry/TODO explicitly defer public HC exposure until dedicated R goldens and an HC leverage decision exist. The remaining HC path in the private port is documented rather than presented as Phase 1c-conformant. Concrete fix: None required in this PR. diff_diff/local_linear.py:L1056-L1075 diff_diff/_nprobust_port.py:L1261-L1284 docs/methodology/REGISTRY.md:L2317-L2318 TODO.md:L89-L90 tests/test_bias_corrected_lprobust.py:L383-L406
Severity P3 (informational). Impact: The supported estimator path aligns with the paper review’s Equation 8 workflow: the wrapper enforces the Design 1'/continuous-near-d̲ scope and mass-point exclusion, auto mode uses the documented b = h / rho=1 choice, and CI construction goes through safe_inference() so degenerate se_rb values become NaN rather than misleading finite intervals. Concrete fix: None. docs/methodology/papers/dechaisemartin-2026-review.md:L95-L99 docs/methodology/papers/dechaisemartin-2026-review.md:L127-L132 diff_diff/local_linear.py:L502-L670 diff_diff/local_linear.py:L1104-L1169 diff_diff/local_linear.py:L1208-L1224 diff_diff/utils.py:L175-L209 tests/test_bias_corrected_lprobust.py:L310-L355 tests/test_bias_corrected_lprobust.py:L438-L470

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity P3 (informational). Impact: The remaining Phase 1c follow-ups are properly tracked: extra-kernel parity, public HC-mode exposure, weights=, multi-eval support, and clustered auto-bandwidth parity. Under the stated review policy these tracked items are non-blocking. Concrete fix: None in this PR. TODO.md:L89-L93

Security

No findings.

Documentation/Tests

Severity P3. Impact: The bias_corrected_local_linear() docstring still says HC0-3 “are exposed here,” and its raises section still describes clustered HC-specific ValueErrors, but the implementation now rejects all non-"nn" vce values with NotImplementedError. That leaves the inline API docs out of sync with the actual behavior, the registry, and the tests. Concrete fix: Update the docstring and raises section to say that Phase 1c exposes only vce="nn" and that HC modes remain private/deferred. diff_diff/local_linear.py:L959-L965 diff_diff/local_linear.py:L993-L997 diff_diff/local_linear.py:L1013-L1017 diff_diff/local_linear.py:L1067-L1075 docs/methodology/REGISTRY.md:L2318-L2318 tests/test_bias_corrected_lprobust.py:L383-L406
Severity P3 (informational). Impact: The added tests materially improve coverage of parity, NaN-safe CI, parameter forwarding, and cluster-missingness handling, but I could not run them here because numpy, scipy, and pytest are missing from the environment. Concrete fix: Run tests/test_bias_corrected_lprobust.py and tests/test_nprobust_port.py in a full Python environment if runtime confirmation is desired.

P3 follow-up from AI review. The class-level scope note still claimed HC0-3 were "exposed here", and the Raises section listed the old `ValueError` for `vce='hc2'/'hc3'` with cluster. Both are stale: the wrapper now raises `NotImplementedError` for any `vce != "nn"` regardless of cluster. Updated the scope note and Raises section to reflect the current behavior. Behavior unchanged; docstring-only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-20T11:44:27Z

/ai-review

github-actions · 2026-04-20T11:56:01Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 5cd60516a9d9b6288ebe26ae2f80c2a3bc2b9d3d

Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. This re-review’s prior documentation mismatch is resolved, and the remaining Phase 1c limitations are either documented in the Methodology Registry or explicitly tracked in TODO.md.

Executive Summary

The prior re-review item is resolved: the public bias_corrected_local_linear() docstring, raises section, implementation, registry entry, and rejection tests now all consistently state that Phase 1c exposes only vce="nn". diff_diff/local_linear.py:938 docs/methodology/REGISTRY.md:2318 tests/test_bias_corrected_lprobust.py:383
The affected methods are diff_diff.local_linear.bias_corrected_local_linear(), diff_diff.local_linear._validate_had_inputs(), and diff_diff._nprobust_port.lprobust(). Their source mapping matches the documented CCT-2014 single-eval lprobust path and the mu-scale Equation 8 workflow in the methodology review. diff_diff/local_linear.py:502 diff_diff/local_linear.py:938 diff_diff/_nprobust_port.py:1016 docs/methodology/papers/dechaisemartin-2026-review.md:95
NaN-safe inference is handled correctly on the new public path: the CI goes through safe_inference(), so degenerate se_robust values produce NaN CI bounds instead of misleading finite intervals. diff_diff/local_linear.py:1212 diff_diff/utils.py:175 tests/test_bias_corrected_lprobust.py:438
Remaining Phase 1c gaps are tracked rather than silently shipped: non-Epanechnikov parity, public HC-mode exposure, weights=, multi-eval support, and clustered auto-bandwidth parity. TODO.md:89
Runtime execution was not possible here because numpy, scipy, and pytest are not installed.

Methodology

Severity P3 (informational). Impact: The only meaningful deviations I found from the cited source material are documented and therefore mitigated under the review policy: the public wrapper is restricted to vce="nn", auto mode forces b = h (rho=1), and clustered parity DGP 4 uses manual h=b=0.3 because of an upstream nprobust pilot-fit bug. Concrete fix: None in this PR. diff_diff/local_linear.py:1060 diff_diff/local_linear.py:1108 docs/methodology/REGISTRY.md:2318 TODO.md:90 benchmarks/R/generate_nprobust_lprobust_golden.R:42
Severity P3 (informational). Impact: The changed estimator path is source-aligned: _validate_had_inputs() enforces the documented HAD scope, the wrapper returns mu-scale Equation 8 ingredients only, auto bandwidth selection forwards estimator-relevant arguments into the same selector used for the final fit, and lprobust() implements the R-mapped active-window, Q.q, and variance formulas for CCT robust bias correction. Concrete fix: None. diff_diff/local_linear.py:564 diff_diff/local_linear.py:853 diff_diff/local_linear.py:1118 diff_diff/_nprobust_port.py:1038 diff_diff/_nprobust_port.py:1175 diff_diff/_nprobust_port.py:1232 docs/methodology/papers/dechaisemartin-2026-review.md:127

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity P3 (informational). Impact: The remaining Phase 1c limitations are explicitly tracked in TODO.md, so they are non-blocking per policy. Concrete fix: None in this PR; follow the tracked items for triangular/uniform kernel parity, public HC exposure, weights=, multi-eval support, and clustered auto-bandwidth parity. TODO.md:89

Security

No findings.

Documentation/Tests

Severity P3 (informational). Impact: The previous AI-review documentation mismatch is resolved; the public docstring and raises section now match the implementation, registry, and rejection tests for non-"nn" vce. Concrete fix: None. diff_diff/local_linear.py:959 diff_diff/local_linear.py:995 diff_diff/local_linear.py:1013 diff_diff/local_linear.py:1071 tests/test_bias_corrected_lprobust.py:383
Severity P3 (informational). Impact: The new tests materially cover parity, cluster-missingness rejection, NaN-safe CI gating, and auto-bandwidth parameter forwarding, but I could not execute them in this environment because numpy, scipy, and pytest are absent. Concrete fix: Run the targeted Phase 1c test files in a full Python environment. tests/test_bias_corrected_lprobust.py:174 tests/test_bias_corrected_lprobust.py:438 tests/test_bias_corrected_lprobust.py:478 tests/test_nprobust_port.py:454

igerber added the ready-for-ci Triggers CI test workflows label Apr 20, 2026

igerber merged commit 9c787f0 into main Apr 20, 2026
23 of 24 checks passed

igerber deleted the did-no-untreated branch April 20, 2026 13:31

Conversation

igerber commented Apr 19, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented Apr 19, 2026

Uh oh!

igerber commented Apr 19, 2026

Uh oh!

github-actions Bot commented Apr 19, 2026

Uh oh!

igerber commented Apr 19, 2026

Uh oh!

github-actions Bot commented Apr 19, 2026

Uh oh!

igerber commented Apr 20, 2026

Uh oh!

github-actions Bot commented Apr 20, 2026

Uh oh!

igerber commented Apr 20, 2026

Uh oh!

github-actions Bot commented Apr 20, 2026

Uh oh!

igerber commented Apr 20, 2026

Uh oh!

github-actions Bot commented Apr 20, 2026

Uh oh!

igerber commented Apr 20, 2026

Uh oh!

github-actions Bot commented Apr 20, 2026

Uh oh!

igerber commented Apr 20, 2026

Uh oh!

github-actions Bot commented Apr 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant